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A variety of health-related data are commonly deposited into electronic health records 
(EHRs), including laboratory, diagnostic, and medication information. The digital nature 
of EHR data facilitates efficient extraction of these data for research studies, including 
genome-wide association studies (GWAS). Previous GWAS have identified numerous 
SNPs associated with variation in total cholesterol (TC), low-density lipoprotein cholesterol 
(LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG). These findings 
have led to the development of specialized genotyping platforms that can be used for 
fine-mapping and replication in other populations. We have combined the efficiency of 
EHR data and the economic advantages of the lllumina Metabochip, a custom designed 
SNP chip targeted to traits related to coronary artery disease, myocardial infarction, and 
type 2 diabetes, to conduct an array-wide analysis of lipid traits in a population with 
extreme obesity. Our analyses identified associations with 12 of 21 previously identified 
lipid-associated SNPs with effect sizes similar to prior results. Association analysis using 
several approaches to account for lipid-lowering medication use resulted in fewer and 
less strongly associated SNPs. The availability of phenotype data from the EHR and 
the economic efficiency of the specialized Metabochip can be exploited to conduct 
multi-faceted genetic association analyses. 
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INTRODUCTION 

Genome-wide association studies (GWAS) have been highly suc- 
cessful at identifying SNPs associated with a wide variety of phe- 
notypes, including lipid disorders, although such studies require 
very large sample sizes (Wilier et al, 2013). This limits their util- 
ity because of economic considerations and the need to acquire 
phenotype data from across diverse sources. These limitations 
can be minimized using data obtained from electronic health 
records (EHRs), which can be an efficient means to obtain robust 
and extant phenotype data (Pathak et al., 2012) from poten- 
tially a large number of individuals, including for metabolic traits 
(Wood et al, 2012) and genetics studies (Wood et al, 2008). 
Furthermore, this approach provides the opportunity to assess 
the relevance of genetic associations in real-world patient pop- 
ulations with selected phenotypic characteristics such as extreme 
obesity. Because lipid screening is part of standard of care testing 
and body weights are often measured, these data are commonly 
present in EHRs. The electronic nature of EHR data facilitates 
efficient extraction for research studies (Prokosch and Ganslandt, 
2009). However, the accuracy of EHR-based data depends upon 



how the data were obtained and entered and how it was extracted. 
Certain portions of the EHR are more standardized, such as 
laboratory measures. Other data, such as medications, may not 
be as straight-forward because of the complexity of coding for 
medication use. 

Large meta-analyses of GWAS have identified numerous 
genetic loci associated with variation in lipid phenotypes, includ- 
ing 39 loci for total cholesterol (TC), 22 loci for low-density 
lipoprotein cholesterol (LDL-C), 3 1 loci for high-density lipopro- 
tein cholesterol (HDL-C), and 16 loci for triglycerides (TG) (Van 
Dongen et al., 2013) as well as body mass index (BMI) (Sandholt 
et al., 2012). These loci are estimated to underlie about one- 
quarter to one-third of the genetic basis for these traits, a result 
that has motivated the search for additional loci through even 
larger GWAS studies (Wilier et al, 2013). Few GWAS have been 
conducted in populations with extreme obesity (Sarzynski et al., 
2011; Rinella et al, 2013), which may differ significantly from 
the large GWAS population-based samples in prevalence of co- 
morbidities such as dyslipidemia and use of corresponding lipid 
lowering medications. 
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Results from GWAS data have led to the development of 
specialized platforms designed to identify additional genetic loci 
as follow-up to initial analyses and to allow for finer genetic 
mapping of previously identified loci. An economical genotyp- 
ing platform, the Illumina Metabochip, was custom designed 
for analysis of traits related to coronary artery disease and type 
2 diabetes (Voight et al., 2012). As a proof-of-principle use of 
EHR data, we genotyped DNA from a cohort of individuals with 
extreme obesity, ascertained on the basis of undergoing bariatric 
surgery and on whom a rich database of EHR-derived pheno- 
type data were available (Still et al, 2013), using the Metabochip 
to evaluate associations with lipid traits. The presence of SNPs 
on the Metabochip residing in loci known to be previously 
associated with blood lipid levels enabled extension of findings 
to the context of extreme obesity. The use of EHR data and 
the Metabochip platform thus provided an effective strategy to 
test the relevance of lipid trait GWAS findings in this patient 
population. 

MATERIALS AND METHODS 

STUDY PARTICIPANTS, EHR SOURCE DATA, AND COLLECTION OF 
BLOOD SAMPLES 

Study participants were patients treated in the Geisinger Clinic 
Center for Nutrition and Weight Management who met clinical 
inclusion and exclusion criteria for bariatric surgery and were 
prospectively recruited into a research program on obesity from 
2004 to 2012. The Geisinger Health System is an integrated health 
care delivery system that serves residents in central and north- 
eastern Pennsylvania that includes the Geisinger Clinic, a network 
of 37 community-based primary care practices that provide care 
to over 400,000 patients. All sites have used the EpicCare™ EHR 
since 2001 (Allen-Ramey et al, 2013). 

Data used for this study were obtained from an obesity 
database based on the EHR as previously described (Wood 
et al., 2012). Source data included patient demographics, clini- 
cal measures, problem list based on ICD-9 codes, medical history, 
medication history, and lab results. Blood drawn for lipid mea- 
surements and DNA isolation was obtained as part of a standard 
of care phlebotomy performed during the pre-surgery period, 
which consisted of a 6-12 month program during which a com- 
prehensive medical history was obtained, a physical exam con- 
ducted, body weight, waist circumference, and height measured, 
and disease-specific, standard of care laboratory tests obtained, 
including fasting TC, HDL-C, LDL-C, and TG. Clinical data were 
recorded in the EHR. Blood for DNA isolation was transported 
to the research laboratory for processing and storage. Genomic 
DNA was isolated from patient whole blood samples as pre- 
viously described (Chu et al., 2008), arrayed into microplates, 
and transported to the University of Maryland Translational 
Genomics Laboratory for Metabochip genotyping. The research 
was approved by the Geisinger Clinic and Penn State Hershey 
Institutional Review Boards and all participants provided written 
informed consent. 

GENOTYPING AND GENOTYPE CLEANING 

A total of 1851 samples was selected for genotyping using the 
Illumina Metabochip. The Metabochip array consists of ~200,000 



SNPs that include: (1) "replication" SNPs corresponding to vali- 
dated associations; (2) a set of 63,450 SNPs that were the most 
significantly associated with over 20 traits related to coronary 
artery disease or T2D, including lipids, (3) SNPs previously asso- 
ciated with BMI and waist circumference, as well as 122,241 
SNPs to fine-map these loci; and (4) 16,992 other SNPs selected 
for a variety of reasons, including those that reached genome- 
wide significance in any GWAS (Voight et al., 2012; Shah et al., 
2013). Genotyping of the Metabochip was performed as per 
the manufacturer's protocol. A total of 196,725 were polymor- 
phic. Samples that had call rates across all SNPs of <95% were 
removed, leaving a total of 1827 samples (Supplementary Table 
1). Eight of the 1827 samples were excluded due to missing phe- 
notype data. After excluding samples discordant for reported and 
genetically determined sex, unresolvable duplicates, and samples 
related to another sample (Supplementary Methods), the remain- 
ing number of subjects available for the analysis set was 1686 
(Supplementary Table 1). 

A series of analyses were also conducted to identify potentially 
problematic SNPs. Starting with the 196,725 polymorphic SNPs, 
we identified a total of 6279 problematic SNPs (Supplementary 
Methods) that were excluded, with the final cleaned dataset con- 
sisting of a total of 190,446 SNPs of which 63,134 SNPs had minor 
allele frequencies <0.01. 

ASSOCIATION ANALYSIS 

Statistical association testing between individual SNPs and lipid 
phenotypes was conducted under an additive model by regressing 
the genotype score (coded as number of copies of the refer- 
ence allele) against the outcome lipid variable. Age and sex were 
included in the model as covariates. Further analyses addressed 
the issue of use of lipid-lowering agents (see Results). Our ini- 
tial aim was to assess associations with 21 SNPs present on the 
Metabochip previously associated with lipid levels in prior meta- 
analysis of GWAS results (Kathiresan et al, 2008). For these 
analyses, we regarded a p-value of 0.0024 (0.05/21) to be statisti- 
cally significant. We additionally assessed associations of 2 1 SNPs 
previously associated with body mass index and waist circumfer- 
ence (Wilier et al, 2009; Speliotes et al, 2010; Sandholt et al, 
2012), regarding a p-value of 0.002 (0.05/21) to be statistically 
significant. 

We performed a secondary analysis to assess associations of 
all Metabochip SNPs with lipid and body weight traits in which 
we adjusted for the total number of SNPs tested, defining the 
significance cut-off as p < 2.6 x 10~ 7 after Bonferroni's cor- 
rection (p = 0.05/190,446). We estimated that our final sample 
size of 1686 individuals provided 80% power to detect SNPs 
explaining 2-2.5% of the variation in lipid or BMI levels at this 
significance level. 

RESULTS 

COHORT CHARACTERISTICS 

The demographic, anthropometric, and lipid profiles of the 
population (Table 1) were characteristic of a bariatric surgery 
cohort (Wood et al., 2012). Over 99% of the population was 
Caucasian/European ancestry. Just under 46% of study subjects 
reported taking one or more lipid-lowering medications, the 
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Table 1 | Demographic and laboratory data. 



Trait Female Male Total 

(n= 1365) (n = 321) (n= 1686) 





Mean 


Stdev 


Mean 


Stdev 


Mean 


Stdev 


BMI (kg/m 2 ) 


46.5 


8.1 


48.6 


9.1 


46.9 


8.3 


WaistCir (inches) 


50.0 


12.5 


56.8 


15.1 


51.3 


13.3 


TG (mg/dl) 


162.7 


106.2 


190.6 


145.5 


168.0 


115.0 


TC (mg/dl) 


184.8 


46.7 


168.4 


51.1 


181.8 


48.0 


HDL-C (mg/dl) 


46.9 


13.7 


38.4 


12.6 


45.3 


13.9 


LDL (mg/dl) 


104.3 


38.4 


88.6 


42.5 


101 .4 


39.7 


TC/HDL-C 


3.8 


2.1 


4.1 


2.7 


3.9 


2.2 



majority taking statins (Supplementary Table 2). The percent of 
patients with a diagnosis of hypertension was 48.7%. The diagno- 
sis of type 2 diabetes was 35.2% (Supplementary Table 2), which 
was reflected in the concomitant use of diabetes medications 
(Supplementary Table 2). 

ASSOCIATION OF SNPS AT KNOWN LIPID LOCI WITH LIPID LEVELS 

Results from association testing of previously identified lipid loci 
(Kathiresan et al., 2008) are shown in Table 2. Of the 21 lipid- 
associated SNPs tested, 12 were nominally associated with one or 
more lipid traits at ap < 0.05, including 3 that remained signifi- 
cant following adjustment for multiple comparisons (p < 0.002). 
The loci marked by these three SNPs included GCKR (associ- 
ated with TG levels at 5.3 x 10~ 4 ), LPL (associated with HDL-C 
levels at 1.4 x 10~ 5 ), and CETP (associated with HDL-C levels 
at 4.1 x 10~ n ). The directions of the observed effects for all of 
the 12 SNPs nominally or significantly associated with lipid levels 
were directionally consistent with those previously reported. 

ADJUSTMENT FOR LIPID-LOWERING MEDICATIONS 

A significant proportion (46%) of subjects in this cohort were 
being treated with lipid-lowering medications. Medication use 
was not associated with levels of LDL-C (beta = — 2.74; p = 0.12) 
or TC (beta = 2.70; p = 0.19), but was significantly associated 
with levels of HDL-C (beta = -2.78; p = 1 x 10~ 6 ) and TG 
(beta = 0.22; p = 1 x 10~ 17 ). The observed values TC, LDL-C, 
and TG levels would likely have been higher, and HDL-C lev- 
els lower, had they not been taking lipid-lowering medications. 
We therefore considered three additional analytic approaches to 
accommodate the effect of the medications. Our first approach 
was to repeat the association analyses after removing all sub- 
jects on lipid-lowering medications (final n = 945). Our sec- 
ond approach was to include use of lipid-lowering medications 
as a covariate (medication user vs. non-user) in the regres- 
sion model (final n = 1686). Our final approach was to restrict 
analysis to subjects taking lipid-lowering medications (final n = 
741). Results of the association analyses of SNPs at known lipid 
loci using all three approaches to address the use of lipid-lowering 
medications are shown in Table 2. Results obtained from anal- 
ysis restricted to subjects not taking lipid-lowering medications 
were generally consistent with those obtained from the ini- 
tial analysis of the entire cohort. With only a few exceptions 
(e.g., rs6544713 near ABCG8), the effect sizes at most loci 



tended to be of the same magnitude, although the p-values 
tended to be less significant in the sample with medication- 
users removed, consistent with a smaller sample size. The same 
trend, i.e., comparable effect sizes but lower statistical signifi- 
cance, was also observed when analyses were restricted to sub- 
jects taking lipid-lowering medications. Inclusion of medica- 
tion use as a covariate in the model had virtually no effect 
on the genotype-lipid phenotype association at any of the 
tested SNPs. 

Array- wide association analysis of lipid levels was also carried 
out using the same three approaches to evaluate the impact of 
lipid-lowering medication use. Manhattan plots for these results 
are shown in Supplementary Figures 3-5. In these analyses, we 
detected the association of HDL-C with the CETP locus at array- 
wide significance thresholds in subjects not taking lipid-lowering 
medications and with medication use as a covariate, but not in 
the subgroup taking lipid-lowering medications (Supplementary 
Table 4). A similar result was obtained for association of LDL-C 
with the APOE locus. The association of TG with the APOA1- 
APOA3-APOA4-APOA5 locus was detected only when using 
medication use as a covariate. 

Association of SNPs at known BMI and waist circumference 
loci For BMI and waist circumference, no SNP achieved a p-value 
of less than 0.002 (Supplementary Table 5). 

ARRAY-WIDE ASSOCIATION ANALYSIS 

Following analysis of the candidate SNPs, association analysis 
was undertaken for all SNPs on the array using an additive 
genetic model for 7 phenotypes; BMI, waist circumference, TC, 
LDL-C, HDL-C, TG, and TC/HDL-C ratio. For BMI and waist 
circumference, no SNP achieved a p-value of less than 1 x 
10^ 6 (Supplementary Figures 1 and 2). 

Results of the array-wide association analyses for 5 lipid 
phenotypes are summarized in Manhattan plots shown in 
Figures 1A-E. SNPs at three loci achieved p-values at less than 
1 x 10~ 7 in association with HDL-C (Figure 1A). A cluster of 
SNPs with p-values less than 1 x 10~ 12 was identified at the 
HERPUD1-CETP locus on chromosome 16 (Keebler et al., 2009). 
All associated SNPs were in high linkage disequilibrium with 
rsl73539 (Figure 2), the peak SNP identified in this region pre- 
viously associated with HDL-C (Kathiresan et al., 2008). A cluster 
of SNPs at the LPL locus on chromosome 8 also associated with 
HDL-C levels, as has been previously reported (Heid et al., 2008). 
As shown in Figure 3, the associated SNPs were in high linkage 
disequilibrium with rsl2678919, the peak SNP previously iden- 
tified in this region associated with HDL-C (Kathiresan et al., 
2008). The third locus associated with HDL-C levels was tagged 
by only a single SNP with a p-value of 7.46 x 10~ 9 was located at 
the NPAS3 locus. 

For TC (Figure IB), the peak association occurred with mul- 
tiple SNPs on chromosome 1 at the CELSR2-PSRC1-SORT1 
locus (peak association: p < 2.6 xl0~ 7 ), which has previously 
been associated with TC in multiple studies (Lu et al, 2010; 
Ma et al., 2010). Associations at this locus were also apparent 
for LDL-C levels (Figure 1C), as has been reported in previous 
studies (Kathiresan et al, 2008; Nakayama et al., 2009), although 
the associations did not achieve the Bonferroni-corrected 
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FIGURE 1 | Manhattan plots for associations of TC, HDL-C, LDL-C, TG, and TC/HDL-C with 190,446 SNPS from the Metabochip. (A) HDL-C (B) TC. (C) 
LDL-C. (D) TG. (E) TC/HDL-C. 
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FIGURE 2 | Zoom plot showing association of SNPs in CETP region (Chromosome 16) associated with HDL-C, and linkage disequilibrium between 
associated SNPs and index SNP 
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FIGURE 3 | Zoom plot showing association of SNPs in LPL region (Chromosome 8) associated with HDL-C, and linkage disequilibrium between 
associated SNPs and index SNP 



level of statistical significance. The well-documented asso- 
ciation of LDL-C levels with APOE on chromosome 19 
(Waterworth et al., 2010), was also detected, with a p-value of 
1.56 x 10~ 12 . 



A significant association was observed between TG and the 
APOA1-APOA3-APOA4-APOA5 locus on chromosome 11 (SNP 
rs number not available; resides at position 116,156,325; see 
Figure ID) as has been previously reported (Johansen et al., 
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2011). Statistically significant associations with TC/HDL-C ratio 
were found at three loci (Figure IE), two overlapping with TC 
and LDL-C, the CELSR2-PSRC1 -SORT1 and APOE regions. 

DISCUSSION 

Large-scale genomic studies can be costly due to the large sample 
sizes required for sufficient power to detect statistically signifi- 
cant differences of alleles with small to moderate effect sizes. Both 
phenotyping and genotyping can be expensive depending upon 
what is required for phenotyping and which genotype platform is 
selected. For this study, we used phenotype data derived from a 
clinical database constructed using EHR data (Wood et al., 2012). 
The laboratory data were obtained as part of clinical standard of 
care saving on the costs of thousands of blood lipid analyses; the 
anthropomorphic and demographic data were obtained by pro- 
fessionally certified clinicians and providers remunerated as part 
of clinical care. Importantly, the data were obtained in an elec- 
tronic format allowing for the efficient construction of a flexible 
database. Despite the electronic format, careful data quality con- 
trol and scrubbing were required in order to ensure robust results. 
In particular, the recording of body weights in a cohort of patients 
with extreme obesity must be carefully curated to ensure accu- 
racy. Similarly, despite selection for high success rates, generating 
accurate SNP calls for the Metabochip platform required quite 
extensive manual curation and data cleaning. 

We successfully extended to an extremely obese cohort asso- 
ciations between SNPs identified by large meta-GWAS studies 
and lipid traits. Failure to replicate associations with many of the 
SNPs was likely due to insufficient power from the smaller sam- 
ple size in our cohort vis-a-vis the very large populations analyzed 
in previous studies. Alternatively, it is possible that some of the 
non-replicated SNPs have smaller effect sizes in extreme obese 
populations. However, the direction of effects and the effect sizes 
were similar to that previously reported by the large meta GWAS 
studies in all but two of the SNPs analyzed. 

The SNPs we associated with TC and LDL-C largely over- 
lapped. This is not surprising since the value for LDL-C is cal- 
culated using the Friedewald equation based on adjusting TC for 
HDL-C and TGs levels (subtract HDL-C and one-fifth of the TGs 
from TC). They are thus correlated values, so linear regression 
analyses will identify similar associations. Medication use was also 
not associated with either TC or LDL-C. Levels of HDL-C appear 
to be under strong genetic control, yet despite such high heritabil- 
ity (Ober et al, 2006), GWAS loci do not explain a large propor- 
tion of HDL-C variation (Wilier and Mohlke, 2012). Nevertheless, 
we replicated several known loci. One locus, LPL, has also been 
robustly associated with risk for cardiovascular disease (Deloukas 
et al., 2013). Our results indicate that this locus may therefore 
also be a risk locus for CVD in patients with extreme obe- 
sity similar to previous studies of other CVD loci (Wood et al., 
2008). Our results for HDL-C are similar to those reported for 
a bariatric surgery cohort of similar sample size in which a total 
of 60 SNPs in the ATP-binding cassette, sub-family A member 1 
(ABCA1), apolipoprotein A1/C3/A4/A5 cluster (APOA5), choles- 
terol ester transfer protein (CETP), UDP-GalNAc transferase 
2 (GALNT2), hepatic lipase (LIPC), endothelial lipase {LIPG), 
lipoprotein lipase (LPL), and the methylmalonic aciduria cblB 



type (MMAB)/mevalonate kinase (MVK) loci were genotyped 
(Sarzynski et al, 2011). Only SNPs in the LPL, LIPC, and CETP 
loci were statistically associated with pre-operative HDL-C level, 
similar to our results, although the multiple test correction factor 
was far less stringent than ours. 

We found only a single locus associated with TG that repli- 
cated from the loci reported by the Global Lipids Genetics 
Consortium meta-analysis of over 100,000 individuals comprised 
of multi-ethnic and multi-racial populations (Teslovich et al., 
2010). This is likely due to the much smaller sample size or alter- 
natively, variation at the APOA1/C3/A4/A5 gene cluster (Lai et al., 
2005) may be the only genetic locus of the previously identified 
loci that associates with TG in extreme obesity. Which gene or 
genes in the APOA1/C3/A4/A5 cluster harbors the TG influencing 
variant is not known. 

A limitation of our study is that many of our subjects were 
on lipid lowering medications that may have masked our abil- 
ity to identify genetic associations. A priori, lowering (or raising 
in the case of HDL-C) lipid levels through medication use may 
be expected to disproportionately occur in subjects with dyslipi- 
demia due to a genetic predisposition, thus decreasing the ability 
to identify lipid-genotype associations. To address this issue, we 
performed three complementary analyses, including removing all 
subjects on lipid-lowering medications, the strategy employed by 
large meta-GWAS (Kathiresan et al., 2008). This predictably led to 
a major loss of statistical power, perhaps acceptable for very large 
sample sizes but not for our cohort. The approach of adjusting for 
medication use as a covariate had virtually no effect on the sensi- 
tivity of detecting SNP-lipid associations. However, this approach 
is biased by the indication for medication being high lipid levels, 
thus any identified associations between genotype and lipid levels 
are not independent of medication use. 

Another potential limitation is that some patients may not 
have been fasting for a sufficient length of time prior to the 
blood draw to avoid an artifactual dietary effect on blood lipid 
measurements. For example, triglyceride levels are particularly 
sensitive to prandial state and other influences (Yuan et al., 2007). 
About 25 patients had triglyceride levels greater than 500mg/dl, 
considered the highest category of hypertriglyceridemia by the 
Adult Treatment Panel III of the National Cholesterol Education 
Program (Expert Panel on Detection, Evaluation, and Treatment 
of High Blood Cholesterol in Adults, 2001). Hypertriglyceridemia 
may be expected to have a higher prevalence in populations 
with extreme obesity. In addition, assuming that the probability 
of non-fasting was independent of genotype, one would expect 
that measurement error due to non-fasting would obscure gene- 
lipid associations, not create false positives. The replication of 
associations at known loci supports the utility of using clinical 
samples. 

No SNPs were found to be significantly associated with 
either BMI or waist circumference. The SNPs selected for 
the Metabochip included those from GWAS from the Genetic 
Investigation of Anthropometric Traits (GIANT) consortium, 
which studied anthropometric traits BMI and waist circumfer- 
ence. A total of 18,211 SNPs from 24 loci (Expert Panel on 
Detection, Evaluation, and Treatment of High Blood Cholesterol 
in Adults, 2001; Brahm and Hegele, 2013) found to be associated 
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with BMI, plus 5055 replication SNPs, were included on the 
Metabochip, along with 1374 SNPs and 1048 replication SNPs 
from 2 loci associated with waist circumference (Yuan et al., 
2007). These studies involved multiple cohorts each with a mean 
BMI of about 27-28 kg/m 2 with a standard deviation of less 
than 5 kg/m 2 . The proportion of these cohorts with a BMI of 
greater 40 kg/m 2 is thus likely less than 5-10%, with a relatively 
limited range of BMIs of less than 25 kg/m 2 . The cohort with 
extreme obesity studied here had an average BMI of 48 ± 8 kg/m 2 , 
representing a much higher average BMI, as well as a much 
wider range in BMI. The BMI of individuals at the upper range 
of the human body weight distribution may represent a dis- 
tinct phenotype (Still et al., 2011) and harbor rarer variants 
with higher penetrance and larger effect sizes than the common 
variants interrogated by the Metabochip platform. Next gen- 
eration sequencing may be required to identify those variants 
(Gerhard et al, 2013). 

In summary, we conducted a GWAS of major lipid traits using 
EHR derived data to analyze SNPs that had previously been asso- 
ciated with lipid phenotypes, as well as other SNPs residing on 
the Metabochip, in an extremely obese cohort. Although several 
lipid loci replicated, other previously identified lipid and body 
weight loci did not. Possible differences may be due to the use of 
EHR data for phenotyping, characteristics of the cohort, and/or 
decreased statistical power. Nevertheless, the availability of extant 
EHR phenotype data and the relatively low cost of the specialized 
Metabochip can be effectively used to conduct a GWAS. 
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